feat(kernelgen): import NKIPyKernelGen as a subfolder #55

Open
shaojiex-aws wants to merge 1 commit into aws-neuron:feat/kernelgen from shaojiex-aws:feat/kernelgen

@shaojiex-aws

Import the open_source branch of NKIPyKernelGen into kernelgen/ as a self-contained subpackage. NKIPyKernelGen is a compiler that traces NumPy functions and lowers them to NISA (Neuron Instruction Set Architecture) for AWS Neuron hardware. Users write kernels in Python with @trace and knob.knob() annotations; the compiler handles tiling, memory placement, layout legalization, and NISA lowering.
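To make the tracing idea concrete, here is a toy sketch of how running a NumPy function on symbolic stand-ins can record an op graph instead of computing values. It is not the NKIPyKernelGen API: `ToyTracedArray`, `TOY_OP_TABLE`, and `toy_trace` are hypothetical names, and intercepting ufuncs through NumPy's `__array_ufunc__` protocol is an assumption about one way such a frontend could work, not a description of `trace.py`.

```python
import numpy as np

# Hypothetical lowering table: NumPy ufunc -> symbolic op name.
# The real op_vtable.py maps NumPy ops to MLIR; these names are placeholders.
TOY_OP_TABLE = {np.add: "add", np.multiply: "mul"}


class ToyTracedArray:
    """Stand-in for a traced value: carries a symbolic name, not data."""

    def __init__(self, name, shape, graph):
        self.name = name
        self.shape = shape
        self.graph = graph  # shared list of recorded ops

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # NumPy calls this hook instead of computing when a ufunc sees us.
        if method != "__call__" or ufunc not in TOY_OP_TABLE:
            return NotImplemented
        result = ToyTracedArray(f"%{len(self.graph)}", self.shape, self.graph)
        operands = tuple(x.name for x in inputs)
        self.graph.append((result.name, TOY_OP_TABLE[ufunc]) + operands)
        return result

    # Route Python operators through the same ufunc path.
    def __add__(self, other):
        return np.add(self, other)

    def __mul__(self, other):
        return np.multiply(self, other)


def toy_trace(fn, shapes):
    """Run fn on symbolic arguments and return (recorded ops, result name)."""
    graph = []
    args = [ToyTracedArray(f"%arg{i}", s, graph) for i, s in enumerate(shapes)]
    out = fn(*args)
    return graph, out.name


def kernel(a, b):
    return np.multiply(a + b, a)


ops, result = toy_trace(kernel, [(4, 4), (4, 4)])
for op in ops:
    print(op)
```

The recorded tuples play the role that MLIR SSA values play in the real frontend: each op gets a fresh name and refers to its operands by name.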

What's included

  • kernelgen/nkipy_kernelgen/ — Python tracing frontend:
    • trace.py (@trace decorator)
    • knob.py (tensor annotations: mem_space, tile_size, reduction_tile, partition_dim)
    • traced_array.py (TracedArray wrapping MLIR SSA values)
    • op_vtable.py (NumPy op → MLIR lowering table)
    • transforms/nkipy_opt.py (pipeline orchestration, shells out to nkipy-opt)
  • kernelgen/mlir/ — MLIR dialect + C++ passes:
    • nkipy.annotate op (target, mem_space, partition_dim, tile_size,
      reduction_tile)
    • 20+ transformation passes under mlir/lib/Transforms/ implementing
      the 24-pass compilation pipeline (InferLayout, KnobDrivenTiling,
      AnnotateMemorySpace, LegalizeLayout, InsertSpillReload,
      LinalgToNisa, etc.)
  • kernelgen/tests/ — test suite:
    • passes/ — per-pass FileCheck tests
    • e2e/ — end-to-end tests (trace → NISA → BIR sim / HW)
    • unit/ — Python-level unit tests
    • harness.py — unified test harness with LLVM/BIR_SIM/HW/FileCheck
      modes
  • kernelgen/examples/ — example kernels
  • kernelgen/compiler_explorer/ — Compiler Explorer wrapper for inspecting
    IR at any pipeline stage
  • kernelgen/setup.py, pyproject.toml, pytest.ini, requirements.txt
    — build + test configuration (pip install -e kernelgen/ builds the
    C++ passes via CMake)
  • kernelgen/CLAUDE.md, README.md — pipeline docs and usage notes
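
For readers unfamiliar with what the `tile_size` and `reduction_tile` knobs listed above control, the following NumPy sketch shows generic loop tiling of a matrix multiply. It assumes `tile_size` bounds the output tile and `reduction_tile` bounds the slice of the contraction dimension; those semantics are an assumption for illustration, and `tiled_matmul` is a hypothetical helper, not what KnobDrivenTiling emits.

```python
import numpy as np


def tiled_matmul(a, b, tile_size=(2, 2), reduction_tile=2):
    """Compute a @ b one output tile at a time, accumulating over K slices."""
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    tm, tn = tile_size
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for i in range(0, m, tm):          # rows of the output tile
        for j in range(0, n, tn):      # columns of the output tile
            rows = min(tm, m - i)      # edge tiles may be smaller
            cols = min(tn, n - j)
            acc = np.zeros((rows, cols), dtype=out.dtype)
            for p in range(0, k, reduction_tile):  # contraction slices
                acc += a[i:i + tm, p:p + reduction_tile] @ b[p:p + reduction_tile, j:j + tn]
            out[i:i + tm, j:j + tn] = acc
    return out


a = np.arange(35).reshape(5, 7)
b = np.arange(21).reshape(7, 3)
print(np.array_equal(tiled_matmul(a, b, tile_size=(2, 2), reduction_tile=3), a @ b))
```

Slicing past the end of an array is safe in NumPy, so shapes that are not multiples of the tile sizes fall out as smaller edge tiles.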

Architecture notes

NKIPyKernelGen depends on the NISA dialect defined in private-nki-staging (the nki wheel). NKIPyKernelGen's nkipy-opt binary performs the tensor-level and bufferization phases; lowering to BIR then runs through the upstream nki-opt-pipeline. This import does not bring in the NISA dialect sources — only NKIPyKernelGen's own passes and frontend.

Ignore rules

Added a !mlir/lib/ override in kernelgen/.gitignore so the parent nkipy repo's lib/ rule (intended for Python venv lib/ dirs) does not silently exclude the MLIR C++ pass sources under kernelgen/mlir/lib/.
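
The effect of the override can be checked with `git check-ignore`. The script below is a minimal sketch that rebuilds the two ignore files in a throwaway repository; `Example.cpp` and `venv/lib/site.py` are placeholder paths, and the script returns `None` when `git` is not installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def is_ignored(repo, rel_path):
    """Ask git whether rel_path is ignored (exit 0 = ignored, 1 = not)."""
    proc = subprocess.run(["git", "check-ignore", "-q", rel_path], cwd=repo)
    if proc.returncode not in (0, 1):
        raise RuntimeError(f"git check-ignore failed ({proc.returncode})")
    return proc.returncode == 0


def touch(repo, rel_path):
    """Create an empty file, including any missing parent directories."""
    path = repo / rel_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("")


def demo_override():
    """Return ignore status before/after the override, or None without git."""
    if shutil.which("git") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        subprocess.run(["git", "init", "-q"], cwd=repo, check=True)
        (repo / ".gitignore").write_text("lib/\n")  # parent repo rule
        src = "kernelgen/mlir/lib/Transforms/Example.cpp"  # placeholder name
        venv_file = "venv/lib/site.py"                     # placeholder name
        touch(repo, src)
        touch(repo, venv_file)
        before = is_ignored(repo, src)
        # The override described above, placed in kernelgen/.gitignore.
        (repo / "kernelgen" / ".gitignore").write_text("!mlir/lib/\n")
        after = is_ignored(repo, src)
        return {"before": before, "after": after, "venv": is_ignored(repo, venv_file)}


print(demo_override())
```

Because the negated pattern contains a slash, it is anchored to `kernelgen/` and re-includes only `kernelgen/mlir/lib/`; other `lib/` directories stay ignored by the parent rule.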

Source

Imported from NKIPyKernelGen open_source branch @ commit 973c1be ("fix: correct mem_space enum values in builder.annotate()"). Internal git history is not preserved — this is a single squash import for the open-source release.

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.